Abstract 

We propose several approaches to steganography in PDF files, all 
based on the Chinese Remainder Theorem. After a cover PDF document 
has been cleansed of unnecessary characters with ASCII code A0, a 
secret message is hidden in it using one of the proposed approaches, 
so that it is invisible to common PDF readers, and the file is then 
transmitted through a non-secure communication channel. Each of our 
methods tries to ensure that the number of inserted A0’s is less than 
the number of characters of the secret message s. 

Keywords: Steganography, PDF files and readers, Chinese Remainder Theorem. 


1 Introduction 

Steganography consists of hiding a secret message in a public document 
acting as a cover, in such a way that, when sent through a non-secure 
communication channel, only the sender and the receiver are able to 
understand it, and anyone else cannot detect the existence of a hidden 
message. It is one of the information hiding techniques shown in 
Figure 1, where Linguistic Steganography is defined by Chapman et al. [1] 
as “the art of using written natural language to conceal secret 
messages”, and Technical Steganography is defined as a structure rather 
than a text, that can be represented by any physical means such as 
invisible inks or microdots [1]. Most of the work in steganography has 
been done on images, video clips, music, sounds and texts. But text 
steganography is the most complex, due to the lack of redundant 
information in text files, whereas a lot of redundancy is present in 
image or sound files, leading to a high exploitation of those files in 
steganography [2]. 

Figure 1: Classification of information hiding techniques 

Several approaches to text steganography are encountered in the 
literature, such as line shift, word shift, syntactic methods, etc. In 
what follows, we focus on steganography based on PDF files. 

2 PDF files based Steganography 

PDF, created by Adobe Systems [3] for document exchange, is a fixed-layout 
format for representing documents in a manner independent of the 
application software, hardware, and operating system. PDF files are 
frequently used nowadays, and this makes it possible to use them as cover 
documents in information hiding. Studies using these files as cover media 
are very few. Our proposal is based on the work of I-Shi et al. [6], in 
which secret data are embedded at between-word or between-character 
locations in a PDF file, by using the non-breaking space with American 
Standard Code for Information Interchange (ASCII) code A0. I-Shi et al. [6] 
found in their study that the non-breaking space (A0), when embedded in a 
string of text characters, becomes invisible in the windows of several 
versions of common PDF readers, and they use that phenomenon for data 
hiding. They showed two types of invisibility, based on the ASCII code A0. 

The first one is created by specifying the width of A0 appearing in the 
PDF reader’s window to be the same as that of the original white-space 
represented by the ASCII code 20. The width of an ASCII code is the width 
of the character represented by the code as displayed in a PDF reader’s 
window. Subsequently, A0 and 20 both appear as white-spaces. Their approach 
based on this first type of invisibility, called alternative space coding, 
uses A0 and 20 in a PDF text alternatively as a between-word space to 
encode a message bit b according to the following binary coding technique: 

if b = 1, then replace 20 between two words by A0; 
if b = 0, make no change. 

This approach has the advantage of incurring no increase of the PDF file 
size, because it just replaces the space exhibited by the code 20 by another 
exhibited by the code A0. However, if the between-word locations in a PDF 
page are few, then only a small number of bits may be embedded. 
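
The binary coding rule above can be illustrated with a minimal Python sketch. It treats the cover text as a flat byte string, which is a simplifying assumption (a real PDF stores text inside content streams), and the function names are hypothetical, not taken from [6].

```python
SPACE, NBSP = 0x20, 0xA0  # ASCII codes 20 and A0 (hexadecimal)


def embed_alternative_space(text: bytes, bits: str) -> bytes:
    """Encode one bit per between-word space: 1 -> A0, 0 -> keep 20."""
    if bits.count("0") + bits.count("1") != len(bits):
        raise ValueError("bits must contain only '0' and '1'")
    if text.count(bytes([SPACE])) < len(bits):
        raise ValueError("not enough between-word spaces in the cover text")
    out = bytearray()
    pending = iter(bits)
    for byte in text:
        # Spaces beyond the end of the message are left unchanged (bit 0).
        if byte == SPACE and next(pending, "0") == "1":
            out.append(NBSP)
        else:
            out.append(byte)
    return bytes(out)


def extract_alternative_space(stego: bytes, n_bits: int) -> str:
    """Read the first n_bits between-word spaces back as a bit string."""
    bits = ["1" if b == NBSP else "0" for b in stego if b in (SPACE, NBSP)]
    return "".join(bits[:n_bits])
```

Because each 20 is replaced by exactly one A0, the output has the same length as the input, which matches the no-size-increase property stated above.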

The second one is created by setting the width of the ASCII code A0 to 
be zero in a PDF page. They found in their study that such an A0 does not 
appear in a PDF reader’s window, just as if it were nonexistent. Their 
approach, called null space coding, given a message character C, embeds it 
at a location L as follows: 

if the index of C as specified in Table 1 is m, 
then embed m consecutive A0’s at location L. 

In the approach they presented, Table 1 of [6] contains the ASCII codes 
selected for message representation in their study, each one indexed with 
an integer value. 

The advantage of this approach is that the between-character locations 
are more numerous than the between-word locations. This makes the 
efficiency of the null space coding much higher. But an obvious 
disadvantage is that the resulting PDF file will be larger than the 
original one (the one without A0’s embedded in it). 

Our work is based on this last type of invisibility described by I-Shi et 
al. Our problem is to reduce the size difference between the cover PDF 
file and the stego PDF file resulting from the embedding process, while 
increasing the embedding capacity of the cover PDF file, in order to 
reduce considerably the risk of detecting a covert communication based on 
the file size. 
3 Our Contribution 


Given a secret message s to be concealed in a cover text, the null space 
coding developed by I-Shi et al. proceeds as follows: 

• Firstly, s is compressed using Huffman coding, which generates a file 
containing a table where each line has a letter of s followed by a 
value; 

• Secondly, for each character of s, a number of A0’s equal to the value 
generated by the Huffman coding for that character is inserted in the 
cover text, thus producing a stegotext; 

• Thirdly, the file and the stegotext are transmitted through a non-secure 
communication channel. We note that two files (the file containing the 
Huffman codes for the characters of the secret message and the PDF 
file resulting from the embedding method) are transmitted. 

Their method cannot guarantee that the number of embedded A0’s is less 
than the number of characters of s, or at least that, as s grows longer, 
the number of inserted A0’s will not explode. 

Our aim is to propose different approaches, based on the Chinese 
Remainder Theorem, whose goal is to attain the above conditions and to 
transmit one and only one file (more precisely, only the stegotext) 
through a non-secure communication channel. 


4 Chinese Remainder Theorem 


Theorem 1 Let $n_1, \ldots, n_k$ be a pairwise relatively prime family of 
positive integers, and let $a_1, \ldots, a_k$ be arbitrary integers. Then 
there exists a solution $x \in \mathbb{Z}$ to the system of congruences 

$x \equiv a_1 \pmod{n_1}$ 
$x \equiv a_2 \pmod{n_2}$ 
$\vdots$ 
$x \equiv a_k \pmod{n_k}$ 

Moreover, any $x' \in \mathbb{Z}$ is a solution to this system of 
congruences if and only if $x \equiv x' \pmod{N}$, where 
$N = \prod_{i=1}^{k} n_i$. 
Given $a_i$ and $n_i$ $(1 \le i \le k)$, we present the classic method of 
construction of $x$ from the $a_i$ and $n_i$ as follows: 

We first construct integers $e_i$ $(1 \le i \le k)$ such that, for 
$i, j = 1, \ldots, k$, we have: 

$e_i \equiv 1 \pmod{n_j}$ if $j = i$, and $e_i \equiv 0 \pmod{n_j}$ if $j \ne i$.   (1) 

Then, setting 

$x = \sum_{i=1}^{k} a_i e_i$ 

allows us to see that, for $j = 1, \ldots, k$, we have 

$x \equiv \sum_{i=1}^{k} a_i e_i \equiv a_j \pmod{n_j}$ 

as all the terms in this sum are zero modulo $n_j$, except for the term 
$i = j$, which is congruent to $a_j$ mod $n_j$. To construct the $e_i$ 
$(1 \le i \le k)$ satisfying (1), let us define $b_i = N/n_i$, which is 
the product of all the moduli $n_j$ with $j \ne i$. Then $c_i$ and $e_i$ 
are defined as follows: $c_i = (b_i)^{-1} \bmod n_i$ and $e_i = b_i c_i$. 

Garner’s algorithm is an efficient method for determining $x$, 
$0 \le x < N$, given $a(x) = (a_1, a_2, \ldots, a_k)$, the residues of $x$ 
modulo the pairwise co-prime moduli $n_1, n_2, \ldots, n_k$ [9]. 

Garner’s algorithm for CRT [9] 

Input: a positive integer $M = \prod_{i=1}^{t} m_i > 1$, with 
$\gcd(m_i, m_j) = 1$ for all $i \ne j$, and a modular representation 
$v(x) = (v_1, v_2, \ldots, v_t)$ of $x$ for the $m_i$. 
Output: the integer x in radix b representation. 

1. For i from 2 to t do the following: 

1.1. $C_i \leftarrow 1$. 

1.2. For j from 1 to (i - 1) do the following: 

$u \leftarrow m_j^{-1} \bmod m_i$. 

$C_i \leftarrow u \times C_i \bmod m_i$. 

2. $u \leftarrow v_1$, $x \leftarrow u$. 

3. For i from 2 to t do the following: 

$u \leftarrow (v_i - x) \times C_i \bmod m_i$; $x \leftarrow x + u \times \prod_{j=1}^{i-1} m_j$. 

4. Return(x). 

Time Complexity: $O(n^2)$ 

This theorem is highly useful in many contexts, such as randomized 
primality testing, modular arithmetic, secret sharing, etc. 
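
Garner’s algorithm as listed above can be sketched in a few lines of Python. This is an illustrative transcription, not the authors’ implementation; the function name `garner` is an assumption, and the modular inverse relies on the three-argument `pow` with exponent -1 (Python 3.8 or later).

```python
def garner(moduli, residues):
    """Return x with x = residues[i] (mod moduli[i]) for pairwise co-prime moduli."""
    t = len(moduli)
    # Step 1: C[i] = (m_0 * ... * m_{i-1})^{-1} mod m_i
    C = [1] * t
    for i in range(1, t):
        for j in range(i):
            u = pow(moduli[j], -1, moduli[i])  # inverse of m_j modulo m_i
            C[i] = (u * C[i]) % moduli[i]
    # Step 2: start from the first residue
    x = residues[0]
    # Step 3: add one mixed-radix digit per remaining modulus
    prod = 1
    for i in range(1, t):
        prod *= moduli[i - 1]
        u = ((residues[i] - x) * C[i]) % moduli[i]
        x += u * prod
    return x


# Classic example: x = 2 (mod 3), x = 3 (mod 5), x = 2 (mod 7)
print(garner([3, 5, 7], [2, 3, 2]))  # 23
```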
5 Preprocessing on the cover file 

The PDF file f that will be used as cover needs to be cleansed of all the 
A0’s contained in it. That is, going from the beginning of the file to 
its end, if we come across an A0 with a width different from 0, we replace 
it by a space character (ASCII code 20), and if we come across an A0 of 
width 0, we remove it, as presented by the following function. 

Input: f: cover PDF file 

Output: f: cover PDF file containing no A0 

1. Open the file f; 

2. Browse the PDF file f character by character and 
for each encountered A0 do: 

2.1 If (sizeof(A0) > 0) then replace A0 by a space character; 

2.2 else remove A0 from f; 

3. Save and close the file f; 

4. Return f; 

Where sizeof(A0) is a function that retrieves the width of the 
non-breaking space character, if it exists, set in a cover PDF file f. 

Time Complexity: O(|f|) 

The reason why we apply this procedure to a cover PDF file is to ensure 
that the file has not been modified by a steganographic technique based 
on ASCII code A0. Also, as A0 by default has the width of the space 
character, it can be replaced by it. All this avoids ambiguity between the 
A0’s inserted by our techniques and those found initially in the cover 
file. 
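
A minimal sketch of this cleansing step, under strong simplifying assumptions: the text of the cover file is modelled as a flat byte string, and the width returned by sizeof(A0) is passed in as the hypothetical parameter `a0_width` rather than read from the PDF font metrics.

```python
def cleanse(data: bytes, a0_width: float) -> bytes:
    """Remove every A0 from the cover text, as in the preprocessing function.

    a0_width stands in for sizeof(A0): a positive width means A0 is
    displayed as a space, so it is replaced by ASCII code 20; a zero
    width means it is invisible, so it is simply dropped.
    """
    if a0_width > 0:
        return data.replace(b"\xa0", b"\x20")
    return data.replace(b"\xa0", b"")
```

Either way the result contains no A0, so every A0 found later in the stego file can be attributed to the embedding step.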


6 Presentation of the different approaches 

For the sender and the receiver to be able to communicate through a 
non-secure channel, they have to agree on a secret key that will be used 
to encode the secret messages sent from one to the other. In our 
approaches, the key $k \in \mathbb{N}$ represents the number of bits (block 
length) into which a secret message $s \in \{0,1\}^*$ is split before its 
encoding. It is selected beforehand by the sender and the receiver and 
shared through a secure channel. In what follows, |s| denotes the length 
of the string s. 
6.1 First Approach 

6.1.1 Hiding method 

We denote by s the secret message, by the integer k the secret key, and by 
f a cover PDF file. Without loss of generality, we assume that the length 
of s is a multiple of k. The first approach proceeds as follows: 

Input: s: secret message; k: secret key; f: cover PDF file. 

Output: f: cover PDF file with embedded A0’s 

Step 1: two co-primes $p_1, p_2$ are computed from k such that 

$p_1 = 2^{\lceil k/2 \rceil}$; $p_2 = p_1 + 1$. 

Step 2: s is split into n blocks of length k stored in the matrix sp such that: 

$sp[i,j] = s[(i-1)k + j]$, $1 \le i \le n$, $1 \le j \le k$. 

Step 3: each line of sp, corresponding to a binary sequence, is transformed 
into its decimal value dec[i] such that 

$dec[i] = \sum_{j=1}^{k} sp[i, k-j+1] \times 2^{j-1}$, $1 \le i \le n$. 

Step 4: for each decimal value dec[i] $(1 \le i \le n)$, two remainders 
r[1,i] and r[2,i] are computed such that 

$r[1,i] = dec[i] \bmod p_1$ and $r[2,i] = dec[i] \bmod p_2$, $1 \le i \le n$. 

Step 5: each r[j,i] $(1 \le j \le 2$ and $1 \le i \le n)$ obtained from the 
previous step is transformed into its binary value, stored in a matrix binr 
bit by bit, such that: 

$binr[(i-1) \times 2 + j, 1] \cdots binr[(i-1) \times 2 + j, \lceil k/2 \rceil] = binDecomp(r[j,i], \lceil k/2 \rceil)$, 

$1 \le j \le 2$ and $1 \le i \le n$. 

Where $binDecomp(r[j,i], \lceil k/2 \rceil)$ is a function that returns the 
binary decomposition of a remainder r[j,i] on $\lceil k/2 \rceil$ bits. 

Step 6: add a column to binr, so that the number of columns moves from 
$\lceil k/2 \rceil$ to $(1 + \lceil k/2 \rceil)$; and for each line add a 
control bit at the end, as shown by the following: 

1. for (i := 1 to (2 x n - 1)) do binr[i, (1 + ceil(k/2))] := 0; 

2. binr[2n, (1 + ceil(k/2))] := 1; 

Step 7: each line of binr is embedded in a cover PDF file f, as described 
by the following: 

1. Get the first between-character location lc; 

2. for (i := 1 to 2 x n) do 

2.1. for (j := 1 to (1 + ceil(k/2))) do 
begin 

2.1.1. if (binr[i, j] = 1) then Insert A0 at lc in the file f; 

2.1.2. Get the next between-character location lc 
end; 


The control bit is there to help the recovery procedure know when to stop 
looking for embedded blocks in the cover file. 

Time Complexity: O(n * k) 
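
The hiding steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors’ code: the exponent in Step 1 is read as ⌈k/2⌉, the cover is modelled as a plain string whose characters delimit the between-character locations, and the function names are hypothetical. The listing does not say what happens when a remainder modulo p2 equals p1 and therefore does not fit on ⌈k/2⌉ bits, so the sketch rejects that case explicitly.

```python
def first_approach_rows(s: str, k: int) -> list:
    """Steps 1-6: return the lines of binr (with control bit) as bit strings."""
    if k <= 0 or not s or len(s) % k != 0:
        raise ValueError("len(s) must be a positive multiple of k")
    width = (k + 1) // 2              # ceil(k/2)
    p1 = 2 ** width                   # Step 1 (assumed reading of the exponent)
    p2 = p1 + 1
    rows = []
    for start in range(0, len(s), k):             # Step 2: blocks of k bits
        dec = int(s[start:start + k], 2)          # Step 3: decimal value
        for r in (dec % p1, dec % p2):            # Step 4: two remainders
            if r >= 2 ** width:
                raise ValueError("remainder does not fit on ceil(k/2) bits")
            rows.append(format(r, f"0{width}b"))  # Step 5: binDecomp
    # Step 6: control bit 0 on every line except the last one
    return [row + "0" for row in rows[:-1]] + [rows[-1] + "1"]


def embed_rows(cover: str, rows: list) -> str:
    """Step 7: insert one A0 after the current character for each bit equal to 1."""
    bits = "".join(rows)
    if len(cover) < len(bits):
        raise ValueError("cover text too short")
    out = []
    for ch, bit in zip(cover, bits):
        out.append(ch)
        if bit == "1":
            out.append("\xa0")
    out.append(cover[len(bits):])     # untouched tail of the cover
    return "".join(out)
```

For example, with s = "1011" and k = 4 the sketch gives p1 = 4, p2 = 5, remainders 3 and 1, and the two lines "110" and "011", so four A0’s are inserted.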

6.1.2 Recovery method 

To recover the secret message from a stego PDF file encoded with the above 
procedure, the binary sequences encoded with A0’s in the file must be 
recovered first, then the remainders that produced those sequences, and 
then, with the key k, the values related to those remainders are computed, 
as described by the following procedure: 

Input: f: stego-PDF file, k: secret key 
Output: s: secret message 

Step 1: two co-primes $p_1, p_2$ are computed from k such that 

$p_1 = 2^{\lceil k/2 \rceil}$; $p_2 = p_1 + 1$. 

Step 2: retrieve the different lines of binr as follows: 

1. i := 1; 

2. exist := true; 

3. n := 0; 

4. Get the first couple of characters (a, b) from f; 

5. while (exist and !feof(f)) do 

begin 

5.1. j := 1; 

5.2. while (j <= (1 + ceil(k/2))) do 

begin 

if (a != A0 and b != A0) then binr[i,j] := 0; 
else 

if (a != A0 and b = A0) then binr[i,j] := 1; 
else 
if (a = A0 and b = A0) then exist := false; 
else j := j - 1; 
endif; 
endif; 
endif; 

j := j + 1; 

c := the next character in f; 
a := b; 
b := c; 
end; 

5.3. i := i + 1; 
end; 

6. n := i - 1; 

Step 3: remove from binr the $(1 + \lceil k/2 \rceil)$-th column, 
corresponding to the control bit’s column. 

Step 4: compute each r[j,i] $(1 \le j \le 2$ and $1 \le i \le n)$ from each 
line of binr such that: 

$r[j,i] = \sum_{l=1}^{\lceil k/2 \rceil} binr[(i-1) \times 2 + j, \lceil k/2 \rceil - l + 1] \times 2^{l-1}$, $1 \le i \le n$. 

Step 5: compute each dec[i] $(1 \le i \le n)$ using Garner’s algorithm such 
that: 

$dec[i] = GarnerAlgorithm(\{p_1, p_2\}, \{r[1,i], r[2,i]\})$, $1 \le i \le n$. 

Step 6: transform each dec[i] into its binary sequence sp[i,j] 
$(1 \le j \le k)$ on k bits such that: 

$(dec[i])_2 = sp[i,1]\, sp[i,2] \cdots sp[i,k]$ (k bits) 

Step 7: merge all the binary strings into one, the secret s, such that: 

$s[(i-1)k + j] = sp[i,j]$, $1 \le i \le n$, $1 \le j \le k$. 

Where GarnerAlgorithm takes as input a list of co-primes $p_1, p_2$ and a 
list of remainders r[1,i], r[2,i], and outputs a unique value dec[i]. 

Time Complexity: O(n * k) 
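
The recovery steps can be sketched in Python as follows, under the same assumptions as the hiding method (exponent read as ⌈k/2⌉, stego text modelled as a plain string, cover cleansed beforehand). Instead of the two-consecutive-A0 test of the listing, this sketch stops when it meets a line whose control bit is 1, which is one possible reading of the role of that bit; the function names are hypothetical. For two moduli, Garner’s algorithm reduces to a single closed-form line.

```python
def read_rows(stego: str, k: int) -> list:
    """Step 2: rebuild the lines of binr, one bit per between-character location."""
    row_len = (k + 1) // 2 + 1        # ceil(k/2) data bits plus the control bit
    rows, current, i = [], "", 0
    while i < len(stego):
        has_a0 = i + 1 < len(stego) and stego[i + 1] == "\xa0"
        current += "1" if has_a0 else "0"
        i += 2 if has_a0 else 1       # skip the A0 that was just read
        if len(current) == row_len:
            rows.append(current)
            if current[-1] == "1":    # control bit set: last line reached
                return rows
            current = ""
    raise ValueError("no line with control bit 1 found")


def first_approach_recover(rows: list, k: int) -> str:
    """Steps 3-7: drop the control bit, rebuild remainders, apply the CRT."""
    width = (k + 1) // 2
    p1 = 2 ** width
    p2 = p1 + 1
    inv_p1 = pow(p1, -1, p2)          # modular inverse (Python 3.8+)
    remainders = [int(row[:width], 2) for row in rows]
    blocks = []
    for r1, r2 in zip(remainders[0::2], remainders[1::2]):
        # Garner's algorithm specialised to the two moduli p1 and p2
        dec = r1 + p1 * (((r2 - r1) * inv_p1) % p2)
        blocks.append(format(dec, f"0{k}b"))
    return "".join(blocks)
```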
6.1.3 Evaluation 


In this approach, for each block of length k, 2 remainders $r_1, r_2$ are 
computed respectively from $p_1$ and $p_2$. As $p_1 < p_2$, we can easily 
deduce that the maximum number of A0’s inserted for a remainder is: 

$\log_2(\max(r_1, r_2)) = \log_2(p_1) = \lceil k/2 \rceil$. 

Thus, the maximum number of A0’s that can be inserted for a block of s is k. 

So, to embed a full secret message s divided into n blocks of length k, the 
maximum number of A0’s that would be needed is: 

$n \times k + 1 = |s| + 1 > |s|$. 

We add 1 here because, for the last computed remainder, an A0 character 
is inserted at the end of the hiding procedure, serving as the ending 
point for the recovery method. From all this comes the following theorem. 

Theorem 2. 

Given a secret message s, a secret key k such that the number of blocks of 
length k is given by $n = |s|/k$, and two co-primes $p_1, p_2$ such that 
$p_1 = 2^{\lceil k/2 \rceil}$, $p_2 = p_1 + 1$, the number N of A0 
insertions at between-character locations to perform in a PDF file is: 


$N \le |s| + 1$ 


Where N depends on the number of bits having value 1 contained in the 
remainders computed from the secret message s. 

6.2 Second Approach 

6.2.1 Hiding method 

In this particular approach, what is considered as the key is not the block 
length k, but m, a value that allows us to compute the primes between 
2 x m and 3 x m, such that the base 2 logarithm of the product of all those 
primes gives the block length k into which a secret message s is divided. 
Those primes allow us to compute remainders, whose values are used to 
compute the positions where one A0 is inserted. The whole procedure 
is defined as follows: 

Input: s: secret message, m: secret key, f: cover PDF file 
Output: f: stego-PDF file 


Step 1: compute primes $p_1, p_2, \ldots, p_t$ such that: 

$2 \times m < p_1 < p_2 < \cdots < p_t < 3 \times m$ 

where t is the number of primes computed between 2 x m and 3 x m. 

Step 2: compute the block length k such that: 

$k = \lfloor \log_2(prod) \rfloor$ 

where $prod = \prod_{i=1}^{t} p_i$. 

Step 3: s is split into n blocks of length k stored in the matrix sp such that: 

$sp[i,j] = s[(i-1)k + j]$, $1 \le i \le n$, $1 \le j \le k$. 

Step 4: each line of sp, corresponding to a binary sequence, is transformed 
into its decimal value dec[i] such that 

$dec[i] = \sum_{j=1}^{k} sp[i, k-j+1] \times 2^{j-1}$, $1 \le i \le n$. 

Step 5: for each decimal value dec[i] $(1 \le i \le n)$, remainders 
r[1,i], r[2,i], ..., r[t,i] are computed such that 

$r[j,i] = dec[i] \bmod p_j$, $1 \le i \le n$, $1 \le j \le t$. 

Step 6: for each remainder r[j,i], $1 \le i \le n$, $1 \le j \le t$, we 
compute positions pos[1,1], ..., pos[t,n], as described by the following 
procedure: 

1. l := 0; i := 1; n := dec.length; h := t x p_t; 

2. while (i <= n) do 

begin 

2.1. for (j := 1 to t) do 

pos[(t x (i - 1)) + j] := l + (j - 1) + (t x r[j,i]); 

2.2. l := l + h; 

2.3. i := i + 1; 
end; 

Step 7: sort the vector pos in ascending order; 

Step 8: for each pos[i], $1 \le i \le (n \times t) - 1$, insert one A0 at the 
pos[i]-th between-character location of f. And, at the pos[n x t]-th 
between-character location of f, insert two A0’s, to mark the end of the 
process. 

Time Complexity: O(n * k) 
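
Steps 1 to 7 of the hiding method can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, positions are counted from 0 as in the listing (l starts at 0), and primes are found by trial division, which is adequate only for small keys m.

```python
from math import isqrt, prod


def primes_between(lo: int, hi: int) -> list:
    """Primes p with lo < p < hi, by trial division."""
    return [p for p in range(max(lo + 1, 2), hi)
            if all(p % d for d in range(2, isqrt(p) + 1))]


def second_approach_positions(s: str, m: int):
    """Return (sorted A0 positions, block length k) for bit string s and key m."""
    primes = primes_between(2 * m, 3 * m)            # Step 1
    if not primes:
        raise ValueError("no prime between 2m and 3m")
    t = len(primes)
    k = prod(primes).bit_length() - 1                # Step 2: floor(log2(prod))
    if k == 0 or not s or len(s) % k != 0:
        raise ValueError("len(s) must be a positive multiple of k")
    h = t * primes[-1]                               # room reserved per block
    positions = []
    for i, start in enumerate(range(0, len(s), k)):  # Step 3
        dec = int(s[start:start + k], 2)             # Step 4
        for j, p in enumerate(primes):               # Steps 5 and 6
            positions.append(i * h + j + t * (dec % p))
    return sorted(positions), k                      # Step 7
```

For m = 6 the primes are 13 and 17, so t = 2 and k = 7; the block "1011001" (decimal 89) has remainders 11 and 4, giving positions 22 and 9. Since each position is congruent to j - 1 modulo t, the receiver can tell which prime a position belongs to even after sorting.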
6.2.2 Recovery method 

To recover the secret message from a stego PDF file encoded with the above 
procedure, the positions of all the A0’s in the file must be recovered 
first, then the remainders that produced those positions, and then, with 
k, the values related to those remainders are computed, as described by 
the following procedure: 

Input: f: stego-PDF file, m: secret key 
Output: s: secret message 

Step 1: compute primes $p_1, p_2, \ldots, p_t$ such that: 

$2 \times m < p_1 < p_2 < \cdots < p_t < 3 \times m$ 

Step 2: compute the block length k such that: 

$k = \lfloor \log_2(prod) \rfloor$ 

where $prod = \prod_{i=1}^{t} p_i$. 

Step 3: compute the block length in the file f such that: 

$h = t \times p_t$ 

Step 4: retrieve the positions where A0’s have been inserted as described 
below: 

1. i := 1; count := 1; n := 0; exist := true; 

2. get the first couple (a, b) of characters from f; 

3. while (exist and !feof(f)) do 

begin 

if (a != A0 and b = A0) then 
begin 

pos[i] := count; 
i := i + 1; 
end; 
else 

if (a != A0 and b != A0) then do nothing; 
else 

if (a = A0 and b != A0) then count := count - 1; 
else 

if (a = A0 and b = A0) then exist := false; 
endif; 
endif; 
endif; 
endif; 
count := count + 1; 
c := the next character in f; 
a := b; 
b := c; 
end; 

4. n := (i - 1) / t; 

Step 5: compute the remainders from the table pos as follows: 

1. l := 0; n := pos.length / t; 

2. for (i := 1 to n) do 

begin 

2.1. for (j := 1 to t) do 

j' := [pos[(t x (i - 1)) + j] - (l + j - 1)] mod t; 
r[j', i] := [pos[(t x (i - 1)) + j] - (l + j' - 1)] / t; 

2.2. l := l + h; 
end; 

Step 6: compute each decimal value dec[i] of a block of s such that: 

$dec[i] = GarnerAlgorithm(\{p_1, p_2, \ldots, p_t\}, \{r[1,i], r[2,i], \ldots, r[t,i]\})$, $1 \le i \le n$. 

Step 7: transform each dec[i] into its binary sequence sp[i,j] 
$(1 \le j \le k)$ on k bits such that: 

$(dec[i])_2 = sp[i,1]\, sp[i,2] \cdots sp[i,k]$ (k bits) 

Step 8: merge all the binary strings into one, the secret s, such that: 

$s[(i-1)k + j] = sp[i,j]$, $1 \le i \le n$, $1 \le j \le k$. 

Where GarnerAlgorithm takes as input a list of co-primes 
$p_1, p_2, \ldots, p_t$ and a list of remainders r[1,i], r[2,i], ..., r[t,i], 
and outputs a unique value dec[i]. 

Time Complexity: O(n * k) 
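
Steps 5 to 8 can be sketched in Python as follows, starting from the list of recovered positions (counted from 0). This is one possible reading of Step 5, not the authors’ code: each position is split into a block index (position div h), a prime index (offset mod t) and a remainder (offset div t). The function names are hypothetical, and an incremental form of the CRT stands in for the GarnerAlgorithm call.

```python
from math import isqrt, prod


def primes_between(lo: int, hi: int) -> list:
    """Primes p with lo < p < hi, by trial division."""
    return [p for p in range(max(lo + 1, 2), hi)
            if all(p % d for d in range(2, isqrt(p) + 1))]


def crt(moduli: list, residues: list) -> int:
    """Incremental CRT for pairwise co-prime moduli (needs Python 3.8+)."""
    x, n = 0, 1
    for m_i, r_i in zip(moduli, residues):
        x += n * (((r_i - x) * pow(n, -1, m_i)) % m_i)
        n *= m_i
    return x


def second_approach_recover(positions: list, m: int) -> str:
    """Rebuild the secret bit string from the A0 positions and the key m."""
    primes = primes_between(2 * m, 3 * m)
    t = len(primes)
    k = prod(primes).bit_length() - 1      # floor(log2(prod))
    h = t * primes[-1]
    blocks = {}
    for pos in positions:
        i, offset = divmod(pos, h)         # block index and offset inside it
        j, r = offset % t, offset // t     # prime index and remainder
        blocks.setdefault(i, [0] * t)[j] = r
    return "".join(format(crt(primes, blocks[i]), f"0{k}b")
                   for i in sorted(blocks))
```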

6.2.3 Evaluation 

Let: 

• p be a prime, 

• $\pi(x)$ be the number of prime numbers less than or equal to x, 

• $p_1, p_2, \ldots, p_t$ be the prime numbers taken between 2m and 3m. 


In this approach, for each block of length k, t A0’s are inserted in the 
cover file. So, to embed a full secret message s divided into n blocks of 
length k, tn A0’s would be needed. Regardless of the number of blocks we 
need to embed, an additional A0 is added to allow the recovery method to 
stop when all the hidden bits have been recovered. This result is 
summarized by the following theorem. 


Theorem 3. 

Given a secret message s, a secret key m, a set of primes 
$p_1, p_2, \ldots, p_t$ taken between 2m and 3m, k the block length such 
that $k = \lfloor \log_2 \prod_{i=1}^{t} p_i \rfloor$, and n the number of 
blocks of length k, such that $n = \lceil |s|/k \rceil$, the number N of A0 
insertions at between-character locations to perform in a PDF file is 
given by: 

$N = t + 1$, if $|s| \le k$ 
$N = (t \times n) + 1$, if $|s| > k$ 


On one hand, as t is the number of primes taken between 2m and 3m, 

$t = \pi(3m) - \pi(2m)$ 

And from the work of Hadamard and de la Vallée Poussin [10], which resulted 
in the following theorem: 

The Prime Number Theorem [10]: 

Let $\pi(n)$ denote the number of primes among 1, 2, ..., n. Then, 

$\pi(n) \sim \frac{n}{\ln(n)}$ as $n \to \infty$. 

We can deduce that: 

$t \approx \frac{3m}{\ln(3m)} - \frac{2m}{\ln(2m)}$   (2) 

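On the other hand, from the estimates of Rosser and Schoenfeld [11] for the 
Chebyshev function $\vartheta(x) = \sum_{p \le x} \ln p$, we have: 

$\vartheta(x) < x\left(1 + \frac{1}{2\ln x}\right)$, for $1 < x$ 

$\vartheta(x) > x\left(1 - \frac{1}{\ln x}\right)$, for $41 \le x$ 

We can deduce that: 

$x - \frac{3x}{\ln(3x)} - \frac{2x}{2\ln(2x)} < \vartheta(3x) - \vartheta(2x) < x + \frac{2x}{\ln(2x)} + \frac{3x}{2\ln(3x)}$ 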

It is easy to show that Vx G M, x > e® we have: 

^ - li#) - 2 !^) < - ^(2a:) < a; + 

From these estimations, we deduce that, for x > e®: 


$\frac{2}{10}x \le \vartheta(3x) - \vartheta(2x) \le \frac{17}{10}x$. 


Thus, putting m = x: 


$\frac{2}{10}m \le k \le \frac{17}{10}m$.   (3) 


From (2) and (3), we can deduce the following corollary. 

Corollary 1 . 

Vm > e®, the number N of AO’s insertions at between-character locations, to 
perform in a PDF file is given by: 


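$N = \frac{3m}{\ln(3m)} - \frac{2m}{\ln(2m)} + 1$, if $|s| \le \frac{17}{10}m$ 

$N = \left(\frac{3m}{\ln(3m)} - \frac{2m}{\ln(2m)}\right) \times n + 1$, if $|s| > \frac{17}{10}m$ 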


6.3 Third Approach 

6.3.1 Hiding method 

Input: s: secret message; k: secret key; f: cover PDF file. 

Output: f: cover PDF file with embedded A0’s 

Step 1: two co-primes $p_1, p_2$ are computed from k such that 

$p_1 = 2^{\lceil k/2 \rceil}$; $p_2 = p_1 + 1$. 

Step 2: s is split into n blocks of length k stored in the matrix sp such that: 

$sp[i,j] = s[(i-1)k + j]$, $1 \le i \le n$, $1 \le j \le k$. 

Step 3: each line of sp, corresponding to a binary sequence, is transformed 
into its decimal value dec[i] such that 

$dec[i] = \sum_{j=1}^{k} sp[i, k-j+1] \times 2^{j-1}$, $1 \le i \le n$. 

Step 4: for each decimal value dec[i] $(1 \le i \le n)$, two remainders 
r[1,i] and r[2,i] are computed such that 

$r[1,i] = dec[i] \bmod p_1$ and $r[2,i] = dec[i] \bmod p_2$, $1 \le i \le n$. 

Step 5: for each remainder r[j,i], $1 \le i \le n$, $1 \le j \le 2$, we 
compute positions pos[1,1], ..., pos[2,n], as described by the following 
procedure: 

1. l := 0; i := 1; n := dec.length; h := 2 x p_2; 

2. while (i <= n) do 

begin 

2.1. for (j := 1 to 2) do 

pos[2 x (i - 1) + j] := l + (j - 1) + 2 x r[j,i]; 

2.2. l := l + h; 

2.3. i := i + 1; 
end; 

Step 6: sort the vector pos in ascending order; 

Step 7: for each pos[i], $1 \le i \le n \times 2 - 1$, insert one A0 at the 
pos[i]-th between-character location of f. And, at the pos[n x 2]-th 
between-character location of f, insert two A0’s, to mark the end of the 
process. 

Time Complexity: O(n * k) 

6.3.2 Recovery method 

Input: f: stego-PDF file, k: secret key 
Output: s: secret message 

Step 1: two co-primes $p_1, p_2$ are computed from k such that 

$p_1 = 2^{\lceil k/2 \rceil}$; $p_2 = p_1 + 1$. 

Step 2: compute the block length in the file f such that: 

$h = 2 \times p_2$ 

Step 3: retrieve the positions where A0’s have been inserted as described 
below: 

1. j := 1; l := h; i := 1; count := 1; n := 0; exist := true; 

2. get the first couple (a, b) of characters from f; 

3. while (exist and !feof(f)) do 

begin 

if (a != A0 and b = A0) then 
begin 

pos[i] := count; 
i := i + 1; 
end; 
else 

if (a != A0 and b != A0) then do nothing; 
else 

if (a = A0 and b != A0) then count := count - 1; 
else 

if (a = A0 and b = A0) then exist := false; 
endif; 
endif; 
endif; 
endif; 

count := count + 1; 
c := the next character in f; 
a := b; 
b := c; 
end; 

4. n := (i - 1) / 2; 

Step 4: compute the remainders from the table pos as follows: 

1. l := 0; n := pos.length / 2; 

2. for (i := 1 to n) do 

begin 

2.1. for (j := 1 to 2) do 

j' := [pos[(2 x (i - 1)) + j] - (l + j - 1)] mod 2; 
r[j', i] := [pos[(2 x (i - 1)) + j] - (l + j' - 1)] / 2; 

2.2. l := l + h; 
end; 

Step 5: compute each decimal value dec[i] of a block of s such that: 

$dec[i] = GarnerAlgorithm(\{p_1, p_2\}, \{r[1,i], r[2,i]\})$, $1 \le i \le n$. 

Step 6: transform each dec[i] into its binary sequence sp[i,j] 
$(1 \le j \le k)$ on k bits such that: 

$(dec[i])_2 = sp[i,1]\, sp[i,2] \cdots sp[i,k]$ (k bits) 

Step 7: merge all the binary strings into one, the secret s, such that: 

$s[(i-1)k + j] = sp[i,j]$, $1 \le i \le n$, $1 \le j \le k$. 

Time Complexity: O(n * k) 
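
The position arithmetic of the third approach can be checked with a short round-trip sketch in Python. It works on position lists only (no PDF handling), reads the exponent in Step 1 as ⌈k/2⌉, counts positions from 0 as in the listing, and uses hypothetical function names; the recovery side takes the parity of the offset inside a block as the index j and half of it as the remainder, which is one possible reading of Step 4.

```python
def third_hide(s: str, k: int) -> list:
    """Return the sorted A0 positions for bit string s and block length k."""
    if k <= 0 or not s or len(s) % k != 0:
        raise ValueError("len(s) must be a positive multiple of k")
    p1 = 2 ** ((k + 1) // 2)          # assumed reading: 2^ceil(k/2)
    p2 = p1 + 1
    h = 2 * p2                        # room reserved per block
    positions = []
    for i in range(len(s) // k):
        dec = int(s[i * k:(i + 1) * k], 2)
        positions.append(i * h + 2 * (dec % p1))       # j = 1: even offset
        positions.append(i * h + 1 + 2 * (dec % p2))   # j = 2: odd offset
    return sorted(positions)


def third_recover(positions: list, k: int) -> str:
    """Invert third_hide using the CRT for the two moduli p1 and p2."""
    p1 = 2 ** ((k + 1) // 2)
    p2 = p1 + 1
    h = 2 * p2
    inv_p1 = pow(p1, -1, p2)          # modular inverse (Python 3.8+)
    rem = [[0, 0] for _ in range(len(positions) // 2)]
    for pos in positions:
        i, offset = divmod(pos, h)
        rem[i][offset % 2] = offset // 2
    return "".join(format(r1 + p1 * (((r2 - r1) * inv_p1) % p2), f"0{k}b")
                   for r1, r2 in rem)
```

Whatever the block contents, exactly two positions are produced per block, which is the count used in the evaluation of this approach.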


6.3.3 Evaluation 


In this approach, for each block of length k, 2 A0’s are inserted in the 
cover file. So, to embed a full secret message s divided into n blocks of 
length k, 2n A0’s would be needed. Regardless of the number of blocks we 
need to embed, an additional A0 is added to allow the recovery method to 
stop when all the hidden bits have been recovered. The result obtained is 
summarized by the following theorem. 


Theorem 4. 

Given a secret message s, a secret key k such that the number of blocks of 
length k is given by $n = |s|/k$, and two co-primes $p_1, p_2$ such that 
$p_1 = 2^{\lceil k/2 \rceil}$, $p_2 = p_1 + 1$, the number N of A0 
insertions at between-character locations to perform in a PDF file is 
given by: 

$N = 3$, if $|s| \le k$ 
$N = 2n + 1$, if $|s| > k$ 


6.4 Fourth Approach 

In this particular approach, there is no need for a secret key. Here, we 
embed only 3 A0’s, at 3 different positions in the cover file f. Their 
positions depend only on the length and value of the secret message that 
a sender wants to send through a non-secure communication channel. 

6.4.1 Hiding method 

Input: s: secret message; f: cover PDF file. 

Output: f: cover PDF file with embedded A0’s 

Step 1: compute n, the length of the secret message s. 

Step 2: insert one A0 at the n-th between-character location in the file f. 

Step 3: compute two co-primes $p_1, p_2$ such that 

$p_1 = 2^{\lceil n/2 \rceil}$; $p_2 = p_1 + 1$. 


Step 4: transform s into its decimal value dec such that 

$dec = \sum_{i=1}^{n} s[n-i+1] \times 2^{i-1}$. 

Step 5: compute two remainders r[1] and r[2] such that 

$r[1] = dec \bmod p_1$ and $r[2] = dec \bmod p_2$. 

Step 6: for each remainder r[i] $(1 \le i \le 2)$, we compute positions 
pos[1] and pos[2] as follows: 

$pos[1] = n + 2 \times r[1]$, and $pos[2] = n + 2 \times r[2] + 1$. 

Step 7: embed one A0 at the pos[1]-th and pos[2]-th between-character 
locations in the file f. 

Time Complexity: O(n) 

6.4.2 Recovery method 

Input: f: stego-PDF file. 

Output: s: secret message 

Step 1: browse the stego-PDF file until we come across the first A0, and 
store its position in n. 

Step 2: compute two co-primes $p_1, p_2$ such that 

$p_1 = 2^{\lceil n/2 \rceil}$; $p_2 = p_1 + 1$. 

Step 3: browse the stego-PDF file from the position n until we come across 
the second A0, and store its position in pos[1]; then continue to the last 
A0, and store its position in pos[2]. 

Step 4: permute if necessary the values of pos[1] and pos[2] as follows: 
begin 

1. pos[1] := pos[1] - n; 

2. pos[2] := pos[2] - n; 

3. if pos[1] is odd, permute with pos[2]; 
end; 

Step 5: compute remainders r[1] and r[2] from positions pos[1] and 
pos[2] as follows: 

$r[1] = pos[1]/2$, and $r[2] = (pos[2] - 1)/2$. 
Step 6: compute the decimal value dec such that: 

$dec = GarnerAlgorithm(\{p_1, p_2\}, \{r[1], r[2]\})$ 

Step 7: transform dec into its binary sequence s of length n bits such that: 

$(dec)_2 = s[1]\, s[2] \cdots s[n]$ (n bits) 

Time Complexity: O(n) 
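
A round-trip sketch of the fourth approach in Python, working on the three positions only (no PDF handling). The exponent of p1 is hard to read in the source; the sketch assumes ⌈n/2⌉, which guarantees that p1 x p2 exceeds every n-bit value so that the CRT result is unique. The function names are hypothetical.

```python
def fourth_hide(s: str):
    """Return (n, pos1, pos2): the three A0 positions for bit string s."""
    if not s:
        raise ValueError("empty secret message")
    n = len(s)
    p1 = 2 ** ((n + 1) // 2)          # assumed reading: 2^ceil(n/2)
    p2 = p1 + 1
    dec = int(s, 2)
    return n, n + 2 * (dec % p1), n + 2 * (dec % p2) + 1


def fourth_recover(n: int, pos_a: int, pos_b: int) -> str:
    """Invert fourth_hide; pos_a and pos_b may be given in either order."""
    p1 = 2 ** ((n + 1) // 2)
    p2 = p1 + 1
    a, b = pos_a - n, pos_b - n
    if a % 2 == 1:                    # the even offset encodes r[1]
        a, b = b, a
    r1, r2 = a // 2, (b - 1) // 2
    # Garner's algorithm specialised to two moduli (Python 3.8+ for pow)
    dec = r1 + p1 * (((r2 - r1) * pow(p1, -1, p2)) % p2)
    return format(dec, f"0{n}b")
```

Note that when r[1] = 0 the position pos[1] coincides with the marker position n; the sketch does not model how the file-level procedure would distinguish the two A0’s in that case.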

6.4.3 Evaluation 

As this method embeds exactly 3 A0’s, no more and no less, no matter how 
long the message is, we reach the following result. 


Theorem 5. 

Given a secret message of length n and two co-primes $p_1, p_2$ such that 
$p_1 = 2^{\lceil n/2 \rceil}$ and $p_2 = p_1 + 1$, the number N of A0 
insertions at between-character locations to perform in a PDF file is 
given by: 

$N = 3$ 


The proof of this theorem is trivial, given the definition of the hiding 
method. 


7 Experimental results 

We conducted experiments on our approaches to make sure we reach our goal, 
which is to reduce the number of A0’s inserted in a PDF file, so as to keep 
the difference between cover and stego PDF files small, while increasing 
the amount of data that can be hidden in the PDF file serving as cover. 

To have a better view of our results, we have chosen the following inputs: 
the secret message s = “This is a covert communication method.” (as in [6]), 
with nchar = 38 characters, and a random PDF file. For that input, I-Shi et 
al. inserted 247 A0’s in a PDF file, as described by the following table. 
Note: C is Character, F is Frequency, N is the number of A0’s for a 
character and B is Bits. 
C        F     N     F*N 
LF       1     12    12 
(space)  5     1     5 
T        1     13    13 
a        2     7     14 
c        3     4     12 
d        1     14    14 
e        2     8     16 
h        2     9     18 
i        4     2     8 
m        3     5     15 
n        2     10    20 
o        4     3     12 
r        1     15    15 
s        2     11    22 
t        3     6     18 
u        1     16    16 
v        1     17    17 
Total    38          247 


Table 1: Number of A0’s inserted with the method of I-Shi et al. 

Regarding our methods, we first preprocessed the cover file, then 
converted the secret message into its binary sequence, where each character 
was replaced by the binary representation of its ASCII code. 

As we have 38 characters, each represented on 8 bits, we have 304 bits 
to hide in the cover PDF file. Let |s| denote the total number of bits and 
bin the binary sequence of the secret message s. 
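
The conversion described above can be reproduced with a few lines of Python; the helper name `to_bits` is hypothetical.

```python
def to_bits(message: str) -> str:
    """Concatenate the 8-bit binary ASCII code of every character."""
    return "".join(format(byte, "08b") for byte in message.encode("ascii"))


message = "This is a covert communication method."
bits = to_bits(message)
print(len(message), len(bits))  # 38 304
```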


C        H     ASCII Code 
LF       0A    00001010 
(space)  20    00100000 
T        54    01010100 
a        61    01100001 
c        63    01100011 
d        64    01100100 
e        65    01100101 
h        68    01101000 
i        69    01101001 
m        6D    01101101 
n        6E    01101110 
o        6F    01101111 
r        72    01110010 
s        73    01110011 
t        74    01110100 
u        75    01110101 
v        76    01110110 


Table 2: ASCII codes of the secret message’s characters 

Where C is Character, H is Hexadecimal (the hexadecimal ASCII code of 
the character) and ASCII Code is the binary ASCII code of the character. 

7.1 First approach 

To compute the number N of inserted A0’s we use Theorem 2, and thus 
we obtain the following results: C is Character, F is Frequency and B is Bits. 
C        F     ASCII Code   B 
LF       1     00001010     2 
(space)  5     00100000     5 
T        1     01010100     3 
a        2     01100001     6 
c        3     01100011     12 
d        1     01100100     3 
e        2     01100101     8 
h        2     01101000     6 
i        4     01101001     16 
m        3     01101101     15 
n        2     01101110     10 
o        4     01101111     24 
r        1     01110010     4 
s        2     01110011     10 
t        3     01110100     12 
u        1     01110101     5 
v        1     01110110     5 
Total    38                 136 


Table 3: Number of A0’s inserted. 

In column B, for each character, we computed the number of bits with 
value 1 in its ASCII code, multiplied by its frequency in the secret 
message s. Thus, one can see that: 

• We obtained a better result than the method of I-Shi et al.: 
N < 247 A0’s. 

• We ensured that the number of inserted A0’s is lower than the 
number of bits of s: N < |s|. 

Note that the value 136 represents the maximum number of A0’s that can 
be inserted in a cover PDF file, given the secret message taken as example 
in this study. 
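
The rule used to fill column B (number of 1 bits in the ASCII code, times the frequency of the character) can be sketched as below. The function names and the sample string are assumptions made for illustration; the sample is not the paper's secret message.

```python
from collections import Counter


def ones_per_character(message: str) -> dict:
    """Map each distinct character to (frequency, 1-bits * frequency)."""
    table = {}
    for char, freq in Counter(message).items():
        ones = bin(ord(char)).count("1")  # number of bits set to 1
        table[char] = (freq, ones * freq)
    return table


def total_ones(message: str) -> int:
    """Sum of column B over all distinct characters of `message`."""
    return sum(b for _, b in ones_per_character(message).values())


# Hypothetical sample (not the paper's secret message)
for char, (freq, b) in ones_per_character("moon").items():
    print(char, freq, b)
print(total_ones("moon"))
```

For instance, the character o (01101111) has six bits set to 1, so with a frequency of 4 its entry in column B is 24, as in Table 3.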

7.2 Second approach 

To compute the number N of inserted A0’s, we use Corollary 1, 
replacing |s| by its value and k by its equation ([3]). Thus: 

    N = [...],  if |s| ≤ k
    N = [...],  if |s| > k

And as the number of A0’s depends on m, we vary the value of m to see 
where its optimal value stands. Here are some of the obtained results: 

In this approach, the block length k is not the secret key; it is 
computed from m, which is the key. Even the set of prime numbers used to 
compute remainders is generated from m. 


[Equation residue; surviving terms: N, 3m / ln(3m)]
m      k      t     n      n*t
2      3      1     102    102
12     20     3     16     48
22     37     5     9      45
32     54     6     6      48
42     71     8     5      45
52     88     10    3      30
62     105    12    3      26
72     122    13    3      39
82     139    14    2      28
92     156    16    2      32
102    173    17    2      34
112    190    18    2      36
122    207    19    2      38
132    224    21    2      42
142    241    22    2      44
152    258    23    2      48
162    275    24    2      48
172    292    25    2      50
179    304    25    1      25
182    309    27    1      27
192    326    28    1      28
202    343    29    1      29
212    360    30    1      30
222    377    31    1      31
232    394    32    1      32
242    411    34    1      34
252    428    35    1      35

Table 4: Number of A0’s (N = n * t), given the number of primes t and the 
number of blocks n, both obtained from m 

By varying the value of m, we obtained a number of curves. 



Figure 2: Evolution of the number t of prime numbers with respect to k 

This curve shows the growth of t with respect to k (or m). We can see 
that the more k grows, the more prime numbers are used in the 
computation of the A0’s. And as each prime generates one A0, the number 
of A0’s grows too. 
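
The reason a longer block calls for more primes comes from the Chinese Remainder Theorem itself: a value can be recovered from its remainders modulo pairwise coprime numbers only if it is smaller than their product. The sketch below is a generic illustration of that property. The function names, the 20-bit block and the three primes are assumptions chosen for the example; they are not the prime-generation rule of this approach.

```python
from math import prod


def to_residues(value: int, moduli: list) -> list:
    """Remainders of `value` modulo each modulus."""
    return [value % p for p in moduli]


def from_residues(residues: list, moduli: list) -> int:
    """Recover the unique value below prod(moduli) from its remainders.

    The moduli must be pairwise coprime (distinct primes qualify).
    """
    product = prod(moduli)
    total = 0
    for r, p in zip(residues, moduli):
        partial = product // p
        # pow(x, -1, p) is the inverse of x modulo p (Python 3.8+)
        total += r * partial * pow(partial, -1, p)
    return total % product


# Illustrative values only: a 20-bit block and three primes whose
# product (1113121) exceeds 2**20, so every 20-bit block is recoverable.
block = int("10110011100011110000", 2)
primes = [101, 103, 107]
residues = to_residues(block, primes)
print(residues)
print(from_residues(residues, primes) == block)
```

With only two of these primes the product (10403) would be far below 2**20, and distinct blocks could share the same remainders, which is why t has to grow with k.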

Then, we generated a curve showing that the closer k gets to |s|, the 
more n, the number of blocks, decreases, until it reaches the value 1, 
where it remains constant no matter the value of k (for k ≥ |s|). 

Figure 3: Evolution of the number of blocks with respect to k 

After having computed, for each value of m, the block length k, the 
number of primes t and the number of blocks n of the secret message s, we 
generated a curve showing the growth of N, the number of A0’s used to 
encode the secret message s, with respect to k (or m). 



Figure 4: Evolution of the number of A0’s with respect to k 

One can see that, when k exceeds |s|, the number N of A0’s depends only 
on the number of primes t: the more t grows, the more N grows. The growth 
of t is itself a consequence of the growth of k, as shown by the first 
curve. 

For a value of k between 1 and |s|, the value of N fluctuates, making 
it difficult to choose the value of the key m that minimizes the number 
of inserted A0’s. But compared to the result of I-Shi et al., for 
k ∈ [1, |s|] the maximum value of N (reached when k = 1) is less than 
half of the value (247 A0’s) they obtained. 

Also, one can see that the optimal value of N can be reached for k ∈ 
[||s|, ||s|]. In that range N < |s|, and there are certain cases 
(k ∈ [92, 102] and k ∈ [182, 222]) where N gets lower than the number of 
characters of the secret message s, which is hard to generalize. 


7.3 Third approach 


To compute the number N of inserted A0’s, we use Theorem 3, where: 

    N = 3,       if |s| ≤ k
    N = 2n + 1,  if |s| > k

And as the number of A0’s depends on k, we vary the value of k to see where 
its optimal value stands. Here are some of the obtained results: 


k            Value of N
1            609
2            305
3            205
16           39
|s|/4        9
|s|/2        5
3*|s|/4      5
|s|          3
5*|s|/4      3
3*|s|/2      3

Table 5: Number of A0’s with respect to k 
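
The values of Table 5 can be reproduced with a few lines of code. This is a sketch under one assumption that is not stated in this section: the number of blocks is taken as n = ceil(|s| / k), which agrees with every row of the table. The function name is introduced here for illustration.

```python
def inserted_a0_third(s_bits: int, k: int) -> int:
    """Number N of inserted A0's for |s| = s_bits and key k."""
    if s_bits <= k:
        return 3
    # Assumption: n is the number of k-bit blocks, rounded up
    n = (s_bits + k - 1) // k
    return 2 * n + 1


s_bits = 304  # |s| for the 38-character message of this section
for k in (1, 2, 3, 16, 76, 152, 228, 304, 380, 456):
    print(k, inserted_a0_third(s_bits, k))
```

The keys 76, 152, 228, 380 and 456 correspond to |s|/4, |s|/2, 3*|s|/4, 5*|s|/4 and 3*|s|/2 for |s| = 304.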


From the above computations, some of whose results are represented in 
the figure below, we can see that: 

• For k < 3, N > |s| > 247 A0’s, which is not a good situation; 

• For k = 3, N = 205 < 247 A0’s and N < |s|. From here on, we insert 
fewer A0’s than with the method of I-Shi et al.; 

• For k > 16, N < 247 A0’s and N < |s|. From this point, N starts to 
get lower than the number of characters of s, whereas for k = 16 we 
have N = 39, one more than the 38 characters contained in s; 

• For 152 ≤ k < |s|, N = 5. At this point, the weight difference 
between the cover and the stego file is almost invisible; 

• For k ≥ |s|, N = 3. N remains constant no matter the value of k. 

Figure 5: Evolution of the number of A0’s with respect to the key k 

So, to ensure that a minimum number of A0’s is inserted in a cover PDF 
file, the sender and the receiver should agree on a secret key with a 
high value. 

7.4 Fourth approach 

First of all, compute the two remainders p_1, p_2 that are used to 
compute the positions. 



Then, convert bin to its decimal value dec and compute three positions, 
at each of which one A0 is inserted in the PDF. Those positions are: 

• First position: pos[0] = |s| = 304 

• Second position: pos[1] = 2 * (dec mod p_1) + |s| 

• Third position: pos[2] = 2 * (dec mod p_2) + |s| 

Whatever the values of the computed positions, only 3 A0’s are inserted. 
One can conclude that: 

• The weight difference between the cover file and the stego file is 
3 bytes; 

• The number N of inserted A0’s is far smaller than the number inserted 
using the method of I-Shi et al.; 

• We ensured that N < |s|. 
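
The three position formulas can be sketched as follows. The function name, the 4-bit sequence and the values of p_1 and p_2 are illustrative assumptions; how p_1 and p_2 are actually derived is not shown here.

```python
def a0_positions(bits: str, p1: int, p2: int) -> list:
    """Positions pos[0], pos[1], pos[2] for the binary sequence `bits`."""
    s_len = len(bits)   # |s|, the number of bits
    dec = int(bits, 2)  # decimal value of the binary sequence
    return [
        s_len,
        2 * (dec % p1) + s_len,
        2 * (dec % p2) + s_len,
    ]


# Hypothetical toy input: bin = 1010 (dec = 10), p_1 = 3, p_2 = 7
print(a0_positions("1010", 3, 7))
```

Whatever the length of the sequence, the function returns exactly three positions, which is why only 3 A0's are inserted.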

We can summarize our results, for the chosen secret message of 38 
characters, in the following table: 



         I-Shi et al.   1st case   2nd case      3rd case   4th case
N        247            138        ∈ [25, 38[    ≥ 3        3
files    2              1          1             1          1

Table 6: Comparison of methods 

With these experiments we have shown the effectiveness and the 
correctness of our approaches. 

8 Discussion 

From the results presented in the previous section, we came up with 
some observations regarding the choice of a secret key to embed a 
secret message s in a cover PDF file. 

The number of signs that a document page can contain is close to 1500, 
where a sign can be a space, a punctuation mark, an apostrophe, etc. 
Thus, the number of between-character locations in that page is close to 
1500 (2^10 < 1500 < 2^11). It implies that: 

• In the first and fourth approaches: for each p_i that is a multiple of 
1500, that is to say p_i = 1500 * a (a ≥ 1, 1 ≤ i ≤ 2), we would need 
a pages to hide the A0’s generated by p_i. 

• In the second and third approaches: h = t × p_t, where h is the number 
of between-character locations used to hide the A0’s generated by t 
prime numbers, and t equals 2 in the third approach. For h a multiple 
of 1500, that is to say h = 1500 * a (a ≥ 1), we would need a pages to 
hide a block of the secret message s. 
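
The page estimate above can be sketched as a simple ceiling division. The constant and the function name are assumptions introduced for illustration, and rounding up for values of h that are not exact multiples of 1500 is also an assumption, since the text only discusses exact multiples.

```python
LOCATIONS_PER_PAGE = 1500  # approximate figure quoted in the text


def pages_needed(h: int) -> int:
    """Pages required to offer h between-character locations."""
    # Ceiling division: a partially used page still counts as a page
    return (h + LOCATIONS_PER_PAGE - 1) // LOCATIONS_PER_PAGE


for h in (1500, 3000, 4500):
    print(h, pages_needed(h))
```

For h = 1500 * a the function returns a, matching the statement that a pages are needed.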

Thereby, the higher h or p_i is, the more pages the cover PDF file needs 
in order to embed our secret message. Here, the amount of embeddable 
information depends on the approach selected. Our approaches can be 
optimized even further by applying a compression algorithm to the secret 
message, as done in [6]. 

The advantage of our method is that it would be difficult to detect the 
integration of secret information in the cover file, while the drawback 
is that the file’s number of pages can grow exponentially, as it depends 
on h or p_i. 


9 Conclusion 

A novel approach to PDF steganography based on the Chinese Remainder 
Theorem is proposed. In this paper we presented four different techniques 
whose purpose is to increase the amount of information that can be hidden 
in a cover PDF file, while considerably reducing the number of A0 
insertions at between-character locations in that file, thus reducing the 
weight difference between a cover file and a stego file in which a secret 
message is embedded. We did this by ensuring that the number of embedded 
A0’s is less than the number of characters of s, or at least that the 
number of inserted A0’s does not explode as s grows. Experimental results 
show the feasibility of the proposed methods, and the parameters needed 
to attain optimal efficiency have been presented. Further research may be 
directed at improving these methods, and at applying the data hiding 
scheme to other applications such as watermarking for copyright 
protection, authentication of PDF files, etc. 